Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 404 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 47.5 KiB |
| Average record size in memory | 120.3 B |
Variable types
| Numeric | 14 |
|---|---|
| Categorical | 1 |
df_index is highly correlated with RAD and 1 other fields | High correlation |
CRIM is highly correlated with ZN and 8 other fields | High correlation |
ZN is highly correlated with CRIM and 5 other fields | High correlation |
INDUS is highly correlated with CRIM and 7 other fields | High correlation |
NOX is highly correlated with CRIM and 8 other fields | High correlation |
RM is highly correlated with LSTAT and 1 other fields | High correlation |
AGE is highly correlated with CRIM and 7 other fields | High correlation |
DIS is highly correlated with CRIM and 6 other fields | High correlation |
RAD is highly correlated with df_index and 3 other fields | High correlation |
TAX is highly correlated with df_index and 8 other fields | High correlation |
PTRATIO is highly correlated with MEDV | High correlation |
LSTAT is highly correlated with CRIM and 8 other fields | High correlation |
MEDV is highly correlated with CRIM and 7 other fields | High correlation |
df_index is highly correlated with RAD and 1 other fields | High correlation |
CRIM is highly correlated with RAD and 1 other fields | High correlation |
ZN is highly correlated with INDUS and 3 other fields | High correlation |
INDUS is highly correlated with ZN and 6 other fields | High correlation |
NOX is highly correlated with ZN and 6 other fields | High correlation |
RM is highly correlated with LSTAT and 1 other fields | High correlation |
AGE is highly correlated with ZN and 5 other fields | High correlation |
DIS is highly correlated with ZN and 5 other fields | High correlation |
RAD is highly correlated with df_index and 4 other fields | High correlation |
TAX is highly correlated with df_index and 7 other fields | High correlation |
PTRATIO is highly correlated with MEDV | High correlation |
LSTAT is highly correlated with INDUS and 6 other fields | High correlation |
MEDV is highly correlated with RM and 2 other fields | High correlation |
CRIM is highly correlated with INDUS and 4 other fields | High correlation |
ZN is highly correlated with INDUS and 1 other fields | High correlation |
INDUS is highly correlated with CRIM and 3 other fields | High correlation |
NOX is highly correlated with CRIM and 4 other fields | High correlation |
AGE is highly correlated with NOX and 2 other fields | High correlation |
DIS is highly correlated with CRIM and 3 other fields | High correlation |
RAD is highly correlated with CRIM and 1 other fields | High correlation |
TAX is highly correlated with CRIM and 1 other fields | High correlation |
LSTAT is highly correlated with AGE and 1 other fields | High correlation |
MEDV is highly correlated with LSTAT | High correlation |
df_index is highly correlated with ZN and 10 other fields | High correlation |
CRIM is highly correlated with INDUS and 3 other fields | High correlation |
ZN is highly correlated with df_index and 8 other fields | High correlation |
INDUS is highly correlated with df_index and 9 other fields | High correlation |
NOX is highly correlated with df_index and 10 other fields | High correlation |
RM is highly correlated with PTRATIO and 3 other fields | High correlation |
AGE is highly correlated with df_index and 8 other fields | High correlation |
DIS is highly correlated with df_index and 9 other fields | High correlation |
RAD is highly correlated with df_index and 9 other fields | High correlation |
TAX is highly correlated with df_index and 6 other fields | High correlation |
PTRATIO is highly correlated with df_index and 10 other fields | High correlation |
B is highly correlated with df_index and 3 other fields | High correlation |
LSTAT is highly correlated with df_index and 9 other fields | High correlation |
MEDV is highly correlated with df_index and 9 other fields | High correlation |
df_index has unique values | Unique |
ZN has 293 (72.5%) zeros | Zeros |
Reproduction
| Analysis started | 2021-10-30 05:26:53.310058 |
|---|---|
| Analysis finished | 2021-10-30 05:28:01.850588 |
| Duration | 1 minute and 8.54 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
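The reported duration can be checked against the two timestamps in the table above. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the report.

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2021-10-30 05:26:53.310058")
finished = datetime.fromisoformat("2021-10-30 05:28:01.850588")

# Elapsed wall-clock time in seconds, then split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```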
df_index
Numeric
| Distinct | 404 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 246.8143564 |
| Minimum | 0 |
|---|---|
| Maximum | 505 |
| Zeros | 1 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 25.15 |
| Q1 | 121.75 |
| median | 245 |
| Q3 | 373.25 |
| 95-th percentile | 473.7 |
| Maximum | 505 |
| Range | 505 |
| Interquartile range (IQR) | 251.5 |
Descriptive statistics
| Standard deviation | 145.3223257 |
|---|---|
| Coefficient of variation (CV) | 0.5887920291 |
| Kurtosis | -1.194203186 |
| Mean | 246.8143564 |
| Median Absolute Deviation (MAD) | 126.5 |
| Skewness | 0.05370653547 |
| Sum | 99713 |
| Variance | 21118.57835 |
| Monotonicity | Not monotonic |
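Several of the figures above are related by simple identities: the mean is the sum divided by the number of observations, the variance is the squared standard deviation, the coefficient of variation is the standard deviation divided by the mean, and the IQR and range are differences of quantiles. The sketch below recomputes them from the reported df_index values; the variable names are illustrative, and the inputs are the rounded numbers printed in the tables, so results agree only up to rounding.

```python
# Values copied from the df_index tables above (rounded as reported).
n = 404
total = 99713
mean = 246.8143564
std = 145.3223257
variance = 21118.57835
q1, q3 = 121.75, 373.25
minimum, maximum = 0, 505

recomputed_mean = total / n        # mean = sum / number of observations
recomputed_variance = std ** 2     # variance = standard deviation squared
cv = std / mean                    # coefficient of variation
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range

print(recomputed_mean, recomputed_variance, cv, iqr, value_range)
```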
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1 | 0.2% |
| 322 | 1 | 0.2% |
| 338 | 1 | 0.2% |
| 337 | 1 | 0.2% |
| 336 | 1 | 0.2% |
| 335 | 1 | 0.2% |
| 333 | 1 | 0.2% |
| 332 | 1 | 0.2% |
| 331 | 1 | 0.2% |
| 328 | 1 | 0.2% |
| Other values (394) | 394 | 97.5% |
| Value | Count | Frequency (%) |
| 0 | 1 | 0.2% |
| 1 | 1 | 0.2% |
| 2 | 1 | 0.2% |
| 3 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 13 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 505 | 1 | 0.2% |
| 504 | 1 | 0.2% |
| 502 | 1 | 0.2% |
| 500 | 1 | 0.2% |
| 499 | 1 | 0.2% |
| 498 | 1 | 0.2% |
| 496 | 1 | 0.2% |
| 495 | 1 | 0.2% |
| 493 | 1 | 0.2% |
| 492 | 1 | 0.2% |
CRIM
Numeric
| Distinct | 402 |
|---|---|
| Distinct (%) | 99.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.69745495 |
| Minimum | 0.00632 |
|---|---|
| Maximum | 88.9762 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 0.00632 |
|---|---|
| 5-th percentile | 0.027358 |
| Q1 | 0.0825975 |
| median | 0.234405 |
| Q3 | 3.5949275 |
| 95-th percentile | 16.67119 |
| Maximum | 88.9762 |
| Range | 88.96988 |
| Interquartile range (IQR) | 3.51233 |
Descriptive statistics
| Standard deviation | 9.146742709 |
|---|---|
| Coefficient of variation (CV) | 2.473794226 |
| Kurtosis | 35.63375544 |
| Mean | 3.69745495 |
| Median Absolute Deviation (MAD) | 0.199565 |
| Skewness | 5.227843948 |
| Sum | 1493.7718 |
| Variance | 83.66290219 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.01501 | 2 | 0.5% |
| 14.3337 | 2 | 0.5% |
| 0.57834 | 1 | 0.2% |
| 0.03548 | 1 | 0.2% |
| 0.95577 | 1 | 0.2% |
| 0.04113 | 1 | 0.2% |
| 0.03537 | 1 | 0.2% |
| 1.38799 | 1 | 0.2% |
| 0.52014 | 1 | 0.2% |
| 0.97617 | 1 | 0.2% |
| Other values (392) | 392 | 97.0% |
| Value | Count | Frequency (%) |
| 0.00632 | 1 | 0.2% |
| 0.00906 | 1 | 0.2% |
| 0.01301 | 1 | 0.2% |
| 0.01311 | 1 | 0.2% |
| 0.0136 | 1 | 0.2% |
| 0.01381 | 1 | 0.2% |
| 0.01432 | 1 | 0.2% |
| 0.01439 | 1 | 0.2% |
| 0.01501 | 2 | 0.5% |
| 0.01538 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 88.9762 | 1 | 0.2% |
| 73.5341 | 1 | 0.2% |
| 67.9208 | 1 | 0.2% |
| 51.1358 | 1 | 0.2% |
| 45.7461 | 1 | 0.2% |
| 41.5292 | 1 | 0.2% |
| 38.3518 | 1 | 0.2% |
| 28.6558 | 1 | 0.2% |
| 25.9406 | 1 | 0.2% |
| 25.0461 | 1 | 0.2% |
ZN
Numeric
| Distinct | 25 |
|---|---|
| Distinct (%) | 6.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.52722772 |
| Minimum | 0 |
|---|---|
| Maximum | 100 |
| Zeros | 293 |
| Zeros (%) | 72.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 20 |
| 95-th percentile | 80 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 23.28828396 |
|---|---|
| Coefficient of variation (CV) | 2.020284888 |
| Kurtosis | 4.15345176 |
| Mean | 11.52722772 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.234782342 |
| Sum | 4657 |
| Variance | 542.34417 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=25)
| Value | Count | Frequency (%) |
| 0 | 293 | 72.5% |
| 20 | 19 | 4.7% |
| 80 | 11 | 2.7% |
| 25 | 9 | 2.2% |
| 22 | 9 | 2.2% |
| 12.5 | 7 | 1.7% |
| 40 | 6 | 1.5% |
| 30 | 5 | 1.2% |
| 90 | 5 | 1.2% |
| 45 | 5 | 1.2% |
| Other values (15) | 35 | 8.7% |
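The Frequency (%) column in these tables is each count divided by the number of observations (404, from the dataset statistics), shown to one decimal place. A minimal sketch, assuming that formatting; `frequency_pct` is a hypothetical helper, not part of pandas-profiling.

```python
N_OBSERVATIONS = 404  # "Number of observations" from the dataset statistics


def frequency_pct(count: int, n: int = N_OBSERVATIONS) -> str:
    """Format a count as a percentage of all rows, with one decimal place."""
    return f"{count / n:.1%}"


# A few ZN counts taken from the table above.
zn_counts = {0: 293, 20: 19, 80: 11, 25: 9}
for value, count in zn_counts.items():
    print(value, count, frequency_pct(count))
```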
| Value | Count | Frequency (%) |
| 0 | 293 | 72.5% |
| 12.5 | 7 | 1.7% |
| 17.5 | 1 | 0.2% |
| 18 | 1 | 0.2% |
| 20 | 19 | 4.7% |
| 21 | 4 | 1.0% |
| 22 | 9 | 2.2% |
| 25 | 9 | 2.2% |
| 28 | 2 | 0.5% |
| 30 | 5 | 1.2% |
| Value | Count | Frequency (%) |
| 100 | 1 | 0.2% |
| 95 | 3 | 0.7% |
| 90 | 5 | 1.2% |
| 85 | 2 | 0.5% |
| 82.5 | 1 | 0.2% |
| 80 | 11 | 2.7% |
| 75 | 3 | 0.7% |
| 70 | 2 | 0.5% |
| 60 | 3 | 0.7% |
| 52.5 | 3 | 0.7% |
INDUS
Numeric
| Distinct | 72 |
|---|---|
| Distinct (%) | 17.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.0775 |
| Minimum | 0.46 |
|---|---|
| Maximum | 27.74 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 0.46 |
|---|---|
| 5-th percentile | 2.18 |
| Q1 | 5.19 |
| median | 9.125 |
| Q3 | 18.1 |
| 95-th percentile | 21.89 |
| Maximum | 27.74 |
| Range | 27.28 |
| Interquartile range (IQR) | 12.91 |
Descriptive statistics
| Standard deviation | 6.848411921 |
|---|---|
| Coefficient of variation (CV) | 0.6182272102 |
| Kurtosis | -1.232672408 |
| Mean | 11.0775 |
| Median Absolute Deviation (MAD) | 5.715 |
| Skewness | 0.3126345632 |
| Sum | 4475.31 |
| Variance | 46.90074584 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 18.1 | 104 | 25.7% |
| 19.58 | 23 | 5.7% |
| 8.14 | 18 | 4.5% |
| 6.2 | 13 | 3.2% |
| 21.89 | 12 | 3.0% |
| 3.97 | 10 | 2.5% |
| 10.59 | 10 | 2.5% |
| 5.86 | 9 | 2.2% |
| 6.91 | 8 | 2.0% |
| 8.56 | 8 | 2.0% |
| Other values (62) | 189 | 46.8% |
| Value | Count | Frequency (%) |
| 0.46 | 1 | 0.2% |
| 0.74 | 1 | 0.2% |
| 1.21 | 1 | 0.2% |
| 1.22 | 1 | 0.2% |
| 1.25 | 1 | 0.2% |
| 1.32 | 1 | 0.2% |
| 1.38 | 1 | 0.2% |
| 1.47 | 2 | 0.5% |
| 1.52 | 4 | 1.0% |
| 1.69 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 27.74 | 3 | 0.7% |
| 25.65 | 7 | 1.7% |
| 21.89 | 12 | 3.0% |
| 19.58 | 23 | 5.7% |
| 18.1 | 104 | 25.7% |
| 15.04 | 2 | 0.5% |
| 13.92 | 4 | 1.0% |
| 13.89 | 3 | 0.7% |
| 12.83 | 6 | 1.5% |
| 11.93 | 3 | 0.7% |
CHAS
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.3 KiB |
| 0.0 | 372 |
|---|---|
| 1.0 | 32 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.0 | 372 | 92.1% |
| 1.0 | 32 | 7.9% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 0.0 | 372 | 92.1% |
| 1.0 | 32 | 7.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| No values found. | ||
Most occurring categories
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per category
Most occurring scripts
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per script
Most occurring blocks
| Value | Count | Frequency (%) |
| No values found. | ||
Most frequent character per block
NOX
Numeric
| Distinct | 76 |
|---|---|
| Distinct (%) | 18.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5530262376 |
| Minimum | 0.385 |
|---|---|
| Maximum | 0.871 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 0.385 |
|---|---|
| 5-th percentile | 0.40915 |
| Q1 | 0.448 |
| median | 0.535 |
| Q3 | 0.624 |
| 95-th percentile | 0.74 |
| Maximum | 0.871 |
| Range | 0.486 |
| Interquartile range (IQR) | 0.176 |
Descriptive statistics
| Standard deviation | 0.1168946844 |
|---|---|
| Coefficient of variation (CV) | 0.2113727641 |
| Kurtosis | -0.07277225706 |
| Mean | 0.5530262376 |
| Median Absolute Deviation (MAD) | 0.089 |
| Skewness | 0.7496713633 |
| Sum | 223.4226 |
| Variance | 0.01366436725 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.538 | 19 | 4.7% |
| 0.713 | 17 | 4.2% |
| 0.437 | 15 | 3.7% |
| 0.871 | 13 | 3.2% |
| 0.489 | 13 | 3.2% |
| 0.624 | 12 | 3.0% |
| 0.693 | 11 | 2.7% |
| 0.605 | 10 | 2.5% |
| 0.74 | 9 | 2.2% |
| 0.7 | 9 | 2.2% |
| Other values (66) | 276 | 68.3% |
| Value | Count | Frequency (%) |
| 0.385 | 1 | 0.2% |
| 0.392 | 2 | 0.5% |
| 0.394 | 1 | 0.2% |
| 0.4 | 3 | 0.7% |
| 0.401 | 3 | 0.7% |
| 0.403 | 3 | 0.7% |
| 0.404 | 3 | 0.7% |
| 0.405 | 3 | 0.7% |
| 0.409 | 2 | 0.5% |
| 0.41 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0.871 | 13 | 3.2% |
| 0.77 | 7 | 1.7% |
| 0.74 | 9 | 2.2% |
| 0.718 | 3 | 0.7% |
| 0.713 | 17 | 4.2% |
| 0.7 | 9 | 2.2% |
| 0.693 | 11 | 2.7% |
| 0.679 | 6 | 1.5% |
| 0.671 | 6 | 1.5% |
| 0.668 | 3 | 0.7% |
RM
Numeric
| Distinct | 366 |
|---|---|
| Distinct (%) | 90.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.268792079 |
| Minimum | 3.561 |
|---|---|
| Maximum | 8.78 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 3.561 |
|---|---|
| 5-th percentile | 5.31 |
| Q1 | 5.87675 |
| median | 6.179 |
| Q3 | 6.6265 |
| 95-th percentile | 7.4676 |
| Maximum | 8.78 |
| Range | 5.219 |
| Interquartile range (IQR) | 0.74975 |
Descriptive statistics
| Standard deviation | 0.6892286632 |
|---|---|
| Coefficient of variation (CV) | 0.1099460079 |
| Kurtosis | 1.676887523 |
| Mean | 6.268792079 |
| Median Absolute Deviation (MAD) | 0.3405 |
| Skewness | 0.2821905315 |
| Sum | 2532.592 |
| Variance | 0.4750361502 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 6.167 | 3 | 0.7% |
| 6.229 | 3 | 0.7% |
| 5.713 | 3 | 0.7% |
| 6.405 | 2 | 0.5% |
| 6.315 | 2 | 0.5% |
| 6.951 | 2 | 0.5% |
| 6.127 | 2 | 0.5% |
| 6.727 | 2 | 0.5% |
| 6.193 | 2 | 0.5% |
| 6.376 | 2 | 0.5% |
| Other values (356) | 381 | 94.3% |
| Value | Count | Frequency (%) |
| 3.561 | 1 | 0.2% |
| 3.863 | 1 | 0.2% |
| 4.138 | 1 | 0.2% |
| 4.368 | 1 | 0.2% |
| 4.519 | 1 | 0.2% |
| 4.628 | 1 | 0.2% |
| 4.88 | 1 | 0.2% |
| 4.903 | 1 | 0.2% |
| 4.906 | 1 | 0.2% |
| 4.97 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 8.78 | 1 | 0.2% |
| 8.398 | 1 | 0.2% |
| 8.375 | 1 | 0.2% |
| 8.297 | 1 | 0.2% |
| 8.259 | 1 | 0.2% |
| 8.069 | 1 | 0.2% |
| 8.04 | 1 | 0.2% |
| 7.929 | 1 | 0.2% |
| 7.923 | 1 | 0.2% |
| 7.875 | 1 | 0.2% |
AGE
Numeric
| Distinct | 302 |
|---|---|
| Distinct (%) | 74.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 67.93564356 |
| Minimum | 2.9 |
|---|---|
| Maximum | 100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 2.9 |
|---|---|
| 5-th percentile | 17.5 |
| Q1 | 43.25 |
| median | 76.8 |
| Q3 | 93.825 |
| 95-th percentile | 100 |
| Maximum | 100 |
| Range | 97.1 |
| Interquartile range (IQR) | 50.575 |
Descriptive statistics
| Standard deviation | 28.56318609 |
|---|---|
| Coefficient of variation (CV) | 0.4204447709 |
| Kurtosis | -1.009870706 |
| Mean | 67.93564356 |
| Median Absolute Deviation (MAD) | 20.2 |
| Skewness | -0.5791546473 |
| Sum | 27446 |
| Variance | 815.8555998 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 35 | 8.7% |
| 87.9 | 4 | 1.0% |
| 96 | 4 | 1.0% |
| 95.6 | 3 | 0.7% |
| 97.3 | 3 | 0.7% |
| 97.9 | 3 | 0.7% |
| 95.4 | 3 | 0.7% |
| 97 | 3 | 0.7% |
| 36.6 | 3 | 0.7% |
| 94.1 | 3 | 0.7% |
| Other values (292) | 340 | 84.2% |
| Value | Count | Frequency (%) |
| 2.9 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 6.2 | 1 | 0.2% |
| 6.5 | 1 | 0.2% |
| 6.6 | 2 | 0.5% |
| 6.8 | 1 | 0.2% |
| 7.8 | 2 | 0.5% |
| 8.4 | 1 | 0.2% |
| 8.9 | 1 | 0.2% |
| 9.9 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 100 | 35 | 8.7% |
| 99.3 | 1 | 0.2% |
| 99.1 | 1 | 0.2% |
| 98.9 | 1 | 0.2% |
| 98.8 | 2 | 0.5% |
| 98.7 | 1 | 0.2% |
| 98.5 | 1 | 0.2% |
| 98.4 | 2 | 0.5% |
| 98.3 | 1 | 0.2% |
| 98.2 | 2 | 0.5% |
DIS
Numeric
| Distinct | 339 |
|---|---|
| Distinct (%) | 83.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.826110644 |
| Minimum | 1.1296 |
|---|---|
| Maximum | 12.1265 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 1.1296 |
|---|---|
| 5-th percentile | 1.45632 |
| Q1 | 2.10535 |
| median | 3.2986 |
| Q3 | 5.141475 |
| 95-th percentile | 7.935835 |
| Maximum | 12.1265 |
| Range | 10.9969 |
| Interquartile range (IQR) | 3.036125 |
Descriptive statistics
| Standard deviation | 2.120998991 |
|---|---|
| Coefficient of variation (CV) | 0.5543485771 |
| Kurtosis | 0.527037588 |
| Mean | 3.826110644 |
| Median Absolute Deviation (MAD) | 1.33105 |
| Skewness | 1.018401402 |
| Sum | 1545.7487 |
| Variance | 4.498636721 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 3.4952 | 5 | 1.2% |
| 5.7209 | 4 | 1.0% |
| 5.2873 | 4 | 1.0% |
| 6.8147 | 4 | 1.0% |
| 5.4007 | 4 | 1.0% |
| 7.3172 | 3 | 0.7% |
| 6.4798 | 3 | 0.7% |
| 7.309 | 3 | 0.7% |
| 4.8122 | 3 | 0.7% |
| 3.9454 | 3 | 0.7% |
| Other values (329) | 368 | 91.1% |
| Value | Count | Frequency (%) |
| 1.1296 | 1 | 0.2% |
| 1.137 | 1 | 0.2% |
| 1.1691 | 1 | 0.2% |
| 1.1742 | 1 | 0.2% |
| 1.3163 | 1 | 0.2% |
| 1.3216 | 1 | 0.2% |
| 1.3325 | 1 | 0.2% |
| 1.3459 | 1 | 0.2% |
| 1.3567 | 1 | 0.2% |
| 1.358 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 12.1265 | 1 | 0.2% |
| 10.7103 | 1 | 0.2% |
| 10.5857 | 2 | 0.5% |
| 9.2229 | 1 | 0.2% |
| 9.2203 | 2 | 0.5% |
| 9.1876 | 1 | 0.2% |
| 9.0892 | 1 | 0.2% |
| 8.9067 | 2 | 0.5% |
| 8.7921 | 1 | 0.2% |
| 8.6966 | 1 | 0.2% |
RAD
Numeric
| Distinct | 9 |
|---|---|
| Distinct (%) | 2.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.47029703 |
| Minimum | 1 |
|---|---|
| Maximum | 24 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 4 |
| median | 5 |
| Q3 | 24 |
| 95-th percentile | 24 |
| Maximum | 24 |
| Range | 23 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 8.680236753 |
|---|---|
| Coefficient of variation (CV) | 0.9165749211 |
| Kurtosis | -0.8248875946 |
| Mean | 9.47029703 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 1.02475827 |
| Sum | 3826 |
| Variance | 75.34651009 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=9)
| Value | Count | Frequency (%) |
| 24 | 104 | 25.7% |
| 5 | 92 | 22.8% |
| 4 | 85 | 21.0% |
| 3 | 32 | 7.9% |
| 2 | 22 | 5.4% |
| 6 | 21 | 5.2% |
| 8 | 18 | 4.5% |
| 7 | 15 | 3.7% |
| 1 | 15 | 3.7% |
| Value | Count | Frequency (%) |
| 1 | 15 | 3.7% |
| 2 | 22 | 5.4% |
| 3 | 32 | 7.9% |
| 4 | 85 | 21.0% |
| 5 | 92 | 22.8% |
| 6 | 21 | 5.2% |
| 7 | 15 | 3.7% |
| 8 | 18 | 4.5% |
| 24 | 104 | 25.7% |
| Value | Count | Frequency (%) |
| 24 | 104 | 25.7% |
| 8 | 18 | 4.5% |
| 7 | 15 | 3.7% |
| 6 | 21 | 5.2% |
| 5 | 92 | 22.8% |
| 4 | 85 | 21.0% |
| 3 | 32 | 7.9% |
| 2 | 22 | 5.4% |
| 1 | 15 | 3.7% |
TAX
Numeric
| Distinct | 63 |
|---|---|
| Distinct (%) | 15.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 403.2574257 |
| Minimum | 187 |
|---|---|
| Maximum | 711 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 187 |
|---|---|
| 5-th percentile | 216.9 |
| Q1 | 277 |
| median | 329 |
| Q3 | 666 |
| 95-th percentile | 666 |
| Maximum | 711 |
| Range | 524 |
| Interquartile range (IQR) | 389 |
Descriptive statistics
| Standard deviation | 169.0304804 |
|---|---|
| Coefficient of variation (CV) | 0.4191627223 |
| Kurtosis | -1.100215829 |
| Mean | 403.2574257 |
| Median Absolute Deviation (MAD) | 74 |
| Skewness | 0.7026853169 |
| Sum | 162916 |
| Variance | 28571.30329 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 666 | 104 | 25.7% |
| 307 | 31 | 7.7% |
| 403 | 23 | 5.7% |
| 437 | 12 | 3.0% |
| 398 | 11 | 2.7% |
| 277 | 10 | 2.5% |
| 304 | 10 | 2.5% |
| 264 | 10 | 2.5% |
| 330 | 9 | 2.2% |
| 384 | 8 | 2.0% |
| Other values (53) | 176 | 43.6% |
| Value | Count | Frequency (%) |
| 187 | 1 | 0.2% |
| 188 | 7 | 1.7% |
| 193 | 7 | 1.7% |
| 198 | 1 | 0.2% |
| 216 | 5 | 1.2% |
| 222 | 4 | 1.0% |
| 223 | 5 | 1.2% |
| 224 | 8 | 2.0% |
| 226 | 1 | 0.2% |
| 233 | 8 | 2.0% |
| Value | Count | Frequency (%) |
| 711 | 3 | 0.7% |
| 666 | 104 | 25.7% |
| 469 | 1 | 0.2% |
| 437 | 12 | 3.0% |
| 432 | 7 | 1.7% |
| 430 | 1 | 0.2% |
| 411 | 1 | 0.2% |
| 403 | 23 | 5.7% |
| 402 | 2 | 0.5% |
| 398 | 11 | 2.7% |
PTRATIO
Numeric
| Distinct | 46 |
|---|---|
| Distinct (%) | 11.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 18.43861386 |
| Minimum | 12.6 |
|---|---|
| Maximum | 22 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 12.6 |
|---|---|
| 5-th percentile | 14.7 |
| Q1 | 17.225 |
| median | 19 |
| Q3 | 20.2 |
| 95-th percentile | 21 |
| Maximum | 22 |
| Range | 9.4 |
| Interquartile range (IQR) | 2.975 |
Descriptive statistics
| Standard deviation | 2.169468737 |
|---|---|
| Coefficient of variation (CV) | 0.1176589929 |
| Kurtosis | -0.2321044426 |
| Mean | 18.43861386 |
| Median Absolute Deviation (MAD) | 1.2 |
| Skewness | -0.8066332559 |
| Sum | 7449.2 |
| Variance | 4.7065946 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=46)
| Value | Count | Frequency (%) |
| 20.2 | 111 | 27.5% |
| 14.7 | 25 | 6.2% |
| 21 | 21 | 5.2% |
| 17.8 | 19 | 4.7% |
| 18.6 | 16 | 4.0% |
| 19.1 | 16 | 4.0% |
| 19.2 | 15 | 3.7% |
| 16.6 | 15 | 3.7% |
| 17.4 | 13 | 3.2% |
| 21.2 | 12 | 3.0% |
| Other values (36) | 141 | 34.9% |
| Value | Count | Frequency (%) |
| 12.6 | 3 | 0.7% |
| 13 | 10 | 2.5% |
| 13.6 | 1 | 0.2% |
| 14.4 | 1 | 0.2% |
| 14.7 | 25 | 6.2% |
| 14.8 | 2 | 0.5% |
| 14.9 | 4 | 1.0% |
| 15.1 | 1 | 0.2% |
| 15.2 | 10 | 2.5% |
| 15.3 | 2 | 0.5% |
| Value | Count | Frequency (%) |
| 22 | 2 | 0.5% |
| 21.2 | 12 | 3.0% |
| 21.1 | 1 | 0.2% |
| 21 | 21 | 5.2% |
| 20.9 | 8 | 2.0% |
| 20.2 | 111 | 27.5% |
| 20.1 | 3 | 0.7% |
| 19.7 | 6 | 1.5% |
| 19.6 | 5 | 1.2% |
| 19.2 | 15 | 3.7% |
B
Numeric
| Distinct | 287 |
|---|---|
| Distinct (%) | 71.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 357.1536881 |
| Minimum | 0.32 |
|---|---|
| Maximum | 396.9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 0.32 |
|---|---|
| 5-th percentile | 88.049 |
| Q1 | 376.0925 |
| median | 391.575 |
| Q3 | 396.1575 |
| 95-th percentile | 396.9 |
| Maximum | 396.9 |
| Range | 396.58 |
| Interquartile range (IQR) | 20.065 |
Descriptive statistics
| Standard deviation | 91.5416474 |
|---|---|
| Coefficient of variation (CV) | 0.2563088397 |
| Kurtosis | 7.383862826 |
| Mean | 357.1536881 |
| Median Absolute Deviation (MAD) | 5.325 |
| Skewness | -2.921604753 |
| Sum | 144290.09 |
| Variance | 8379.873208 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 396.9 | 96 | 23.8% |
| 395.24 | 3 | 0.7% |
| 393.74 | 3 | 0.7% |
| 341.6 | 2 | 0.5% |
| 395.62 | 2 | 0.5% |
| 377.07 | 2 | 0.5% |
| 376.14 | 2 | 0.5% |
| 393.23 | 2 | 0.5% |
| 395.58 | 2 | 0.5% |
| 392.78 | 2 | 0.5% |
| Other values (277) | 288 | 71.3% |
| Value | Count | Frequency (%) |
| 0.32 | 1 | 0.2% |
| 2.52 | 1 | 0.2% |
| 2.6 | 1 | 0.2% |
| 3.5 | 1 | 0.2% |
| 3.65 | 1 | 0.2% |
| 6.68 | 1 | 0.2% |
| 9.32 | 1 | 0.2% |
| 10.48 | 1 | 0.2% |
| 16.45 | 1 | 0.2% |
| 21.57 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 396.9 | 96 | 23.8% |
| 396.33 | 1 | 0.2% |
| 396.28 | 1 | 0.2% |
| 396.24 | 1 | 0.2% |
| 396.23 | 1 | 0.2% |
| 396.21 | 1 | 0.2% |
| 396.14 | 1 | 0.2% |
| 396.06 | 2 | 0.5% |
| 395.99 | 1 | 0.2% |
| 395.93 | 1 | 0.2% |
LSTAT
Numeric
| Distinct | 366 |
|---|---|
| Distinct (%) | 90.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 12.7785396 |
| Minimum | 1.73 |
|---|---|
| Maximum | 37.97 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 1.73 |
|---|---|
| 5-th percentile | 3.7045 |
| Q1 | 7.0925 |
| median | 11.465 |
| Q3 | 17.1025 |
| 95-th percentile | 26.8125 |
| Maximum | 37.97 |
| Range | 36.24 |
| Interquartile range (IQR) | 10.01 |
Descriptive statistics
| Standard deviation | 7.216402642 |
|---|---|
| Coefficient of variation (CV) | 0.5647282761 |
| Kurtosis | 0.5613141428 |
| Mean | 12.7785396 |
| Median Absolute Deviation (MAD) | 4.82 |
| Skewness | 0.9211819112 |
| Sum | 5162.53 |
| Variance | 52.07646709 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 8.05 | 3 | 0.7% |
| 7.79 | 3 | 0.7% |
| 6.36 | 3 | 0.7% |
| 18.13 | 3 | 0.7% |
| 15.17 | 2 | 0.5% |
| 23.98 | 2 | 0.5% |
| 13.15 | 2 | 0.5% |
| 10.11 | 2 | 0.5% |
| 3.76 | 2 | 0.5% |
| 14.81 | 2 | 0.5% |
| Other values (356) | 380 | 94.1% |
| Value | Count | Frequency (%) |
| 1.73 | 1 | 0.2% |
| 1.98 | 1 | 0.2% |
| 2.87 | 1 | 0.2% |
| 2.94 | 1 | 0.2% |
| 2.97 | 1 | 0.2% |
| 2.98 | 1 | 0.2% |
| 3.01 | 1 | 0.2% |
| 3.11 | 2 | 0.5% |
| 3.13 | 1 | 0.2% |
| 3.16 | 2 | 0.5% |
| Value | Count | Frequency (%) |
| 37.97 | 1 | 0.2% |
| 36.98 | 1 | 0.2% |
| 34.77 | 1 | 0.2% |
| 34.41 | 1 | 0.2% |
| 34.37 | 1 | 0.2% |
| 34.02 | 1 | 0.2% |
| 31.99 | 1 | 0.2% |
| 30.81 | 2 | 0.5% |
| 30.63 | 1 | 0.2% |
| 30.62 | 1 | 0.2% |
MEDV
Numeric
| Distinct | 210 |
|---|---|
| Distinct (%) | 52.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.52227723 |
| Minimum | 5 |
|---|---|
| Maximum | 50 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 3.3 KiB |
Quantile statistics
| Minimum | 5 |
|---|---|
| 5-th percentile | 10.23 |
| Q1 | 17.175 |
| median | 21.05 |
| Q3 | 25.225 |
| 95-th percentile | 42.15 |
| Maximum | 50 |
| Range | 45 |
| Interquartile range (IQR) | 8.05 |
Descriptive statistics
| Standard deviation | 8.998990777 |
|---|---|
| Coefficient of variation (CV) | 0.3995595421 |
| Kurtosis | 1.469384182 |
| Mean | 22.52227723 |
| Median Absolute Deviation (MAD) | 3.95 |
| Skewness | 1.056975148 |
| Sum | 9099 |
| Variance | 80.981835 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 50 | 11 | 2.7% |
| 23.1 | 7 | 1.7% |
| 25 | 6 | 1.5% |
| 20.6 | 6 | 1.5% |
| 21.2 | 5 | 1.2% |
| 17.8 | 5 | 1.2% |
| 21.7 | 5 | 1.2% |
| 19.6 | 5 | 1.2% |
| 22 | 5 | 1.2% |
| 21.4 | 5 | 1.2% |
| Other values (200) | 344 | 85.1% |
| Value | Count | Frequency (%) |
| 5 | 2 | 0.5% |
| 5.6 | 1 | 0.2% |
| 7 | 2 | 0.5% |
| 7.2 | 3 | 0.7% |
| 7.4 | 1 | 0.2% |
| 7.5 | 1 | 0.2% |
| 8.3 | 1 | 0.2% |
| 8.4 | 1 | 0.2% |
| 8.5 | 1 | 0.2% |
| 8.7 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 50 | 11 | 2.7% |
| 48.8 | 1 | 0.2% |
| 48.5 | 1 | 0.2% |
| 46.7 | 1 | 0.2% |
| 46 | 1 | 0.2% |
| 45.4 | 1 | 0.2% |
| 44 | 1 | 0.2% |
| 43.8 | 1 | 0.2% |
| 43.5 | 1 | 0.2% |
| 42.8 | 1 | 0.2% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
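The definition above (Pearson's r applied to ranks) can be sketched in plain Python. This is an illustration under stated assumptions, not the code pandas-profiling runs: the function names are hypothetical, ties receive the average of their rank positions, and the sample data is made up to show a nonlinear but strictly increasing relationship.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear, but strictly increasing in x
print("spearman:", spearman(x, y))
print("pearson: ", pearson(x, y))
```

On this toy input the rank-based coefficient is (up to floating-point error) 1, while Pearson's r is lower because the relationship is not linear.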
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
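The invariance under changes of location and scale can be illustrated with a small sketch. The five RM and MEDV values are taken from the first sample rows of this report purely as example input, so the resulting r says nothing about the full dataset; the shift and scale constants are arbitrary, and the scale factors are kept positive because a negative factor would flip the sign of r.

```python
from statistics import mean


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


rm = [6.169, 6.145, 5.277, 6.232, 5.565]  # RM, first five sample rows
medv = [25.3, 23.3, 7.2, 21.2, 11.7]      # MEDV, same rows

r = pearson(rm, medv)
# Rescale and shift each variable separately (arbitrary positive scales).
r_transformed = pearson([2.0 * a + 10.0 for a in rm],
                        [0.5 * b - 3.0 for b in medv])
print(r, r_transformed)
```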
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
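The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that formula; it is an assumption that this matches the report's numbers, since libraries such as pandas and SciPy default to τ-b, which adjusts the denominator for ties. The function name and sample data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with tied pairs counted as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Five concordant pairs and one discordant pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```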
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | df_index | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42 | 0.14150 | 0.0 | 6.91 | 0.0 | 0.448 | 6.169 | 6.6 | 5.7209 | 3.0 | 233.0 | 17.9 | 383.37 | 5.81 | 25.3 |
| 1 | 58 | 0.15445 | 25.0 | 5.13 | 0.0 | 0.453 | 6.145 | 29.2 | 7.8148 | 8.0 | 284.0 | 19.7 | 390.68 | 6.86 | 23.3 |
| 2 | 385 | 16.81180 | 0.0 | 18.10 | 0.0 | 0.700 | 5.277 | 98.1 | 1.4261 | 24.0 | 666.0 | 20.2 | 396.90 | 30.81 | 7.2 |
| 3 | 78 | 0.05646 | 0.0 | 12.83 | 0.0 | 0.437 | 6.232 | 53.7 | 5.0141 | 5.0 | 398.0 | 18.7 | 386.40 | 12.34 | 21.2 |
| 4 | 424 | 8.79212 | 0.0 | 18.10 | 0.0 | 0.584 | 5.565 | 70.6 | 2.0635 | 24.0 | 666.0 | 20.2 | 3.65 | 17.16 | 11.7 |
| 5 | 160 | 1.27346 | 0.0 | 19.58 | 1.0 | 0.605 | 6.250 | 92.6 | 1.7984 | 5.0 | 403.0 | 14.7 | 338.92 | 5.50 | 27.0 |
| 6 | 185 | 0.06047 | 0.0 | 2.46 | 0.0 | 0.488 | 6.153 | 68.8 | 3.2797 | 3.0 | 193.0 | 17.8 | 387.11 | 13.15 | 29.6 |
| 7 | 101 | 0.11432 | 0.0 | 8.56 | 0.0 | 0.520 | 6.781 | 71.3 | 2.8561 | 5.0 | 384.0 | 20.9 | 395.58 | 7.67 | 26.5 |
| 8 | 268 | 0.54050 | 20.0 | 3.97 | 0.0 | 0.575 | 7.470 | 52.6 | 2.8720 | 5.0 | 264.0 | 13.0 | 390.30 | 3.16 | 43.5 |
| 9 | 173 | 0.09178 | 0.0 | 4.05 | 0.0 | 0.510 | 6.416 | 84.1 | 2.6463 | 5.0 | 296.0 | 16.6 | 395.50 | 9.04 | 23.6 |
Last rows
| | df_index | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 394 | 448 | 9.32909 | 0.0 | 18.10 | 0.0 | 0.7130 | 6.185 | 98.7 | 2.2616 | 24.0 | 666.0 | 20.2 | 396.90 | 18.13 | 14.1 |
| 395 | 335 | 0.03961 | 0.0 | 5.19 | 0.0 | 0.5150 | 6.037 | 34.5 | 5.9853 | 5.0 | 224.0 | 20.2 | 396.90 | 8.01 | 21.1 |
| 396 | 133 | 0.32982 | 0.0 | 21.89 | 0.0 | 0.6240 | 5.822 | 95.4 | 2.4699 | 4.0 | 437.0 | 21.2 | 388.69 | 15.03 | 18.4 |
| 397 | 203 | 0.03510 | 95.0 | 2.68 | 0.0 | 0.4161 | 7.853 | 33.2 | 5.1180 | 4.0 | 224.0 | 14.7 | 392.78 | 3.81 | 48.5 |
| 398 | 393 | 8.64476 | 0.0 | 18.10 | 0.0 | 0.6930 | 6.193 | 92.6 | 1.7912 | 24.0 | 666.0 | 20.2 | 396.90 | 15.17 | 13.8 |
| 399 | 255 | 0.03548 | 80.0 | 3.64 | 0.0 | 0.3920 | 5.876 | 19.1 | 9.2203 | 1.0 | 315.0 | 16.4 | 395.18 | 9.25 | 20.9 |
| 400 | 72 | 0.09164 | 0.0 | 10.81 | 0.0 | 0.4130 | 6.065 | 7.8 | 5.2873 | 4.0 | 305.0 | 19.2 | 390.91 | 5.52 | 22.8 |
| 401 | 396 | 5.87205 | 0.0 | 18.10 | 0.0 | 0.6930 | 6.405 | 96.0 | 1.6768 | 24.0 | 666.0 | 20.2 | 396.90 | 19.37 | 12.5 |
| 402 | 235 | 0.33045 | 0.0 | 6.20 | 0.0 | 0.5070 | 6.086 | 61.5 | 3.6519 | 8.0 | 307.0 | 17.4 | 376.75 | 10.88 | 24.0 |
| 403 | 37 | 0.08014 | 0.0 | 5.96 | 0.0 | 0.4990 | 5.850 | 41.5 | 3.9342 | 5.0 | 279.0 | 19.2 | 396.90 | 8.77 | 21.0 |